Test your hypotheses using informative data visualizations
Skipping to the end
How did we do this?
library(ggplot2)
library(ggthemes)  # provides scale_color_colorblind()

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(colour = class)) +
  geom_smooth(method = "lm") +
  theme(
    legend.position = "bottom",
    panel.grid = element_blank(),
    panel.background = element_blank(),
    plot.title.position = "plot",
    plot.title = element_text(face = "bold")
  ) +
  labs(
    title = "Engine displacement and highway miles per gallon",
    subtitle = "Values for seven different classes of cars",
    x = "Engine displacement (L)",
    y = "Highway miles per gallon"
  ) +
  scale_color_colorblind()
EXERCISE
How many rows are in mpg? How many columns?
nrow(mpg)
ncol(mpg)
What does the drv variable describe?
?mpg
EXERCISE
Make a scatterplot of hwy vs cyl.
What happens if you make a scatterplot of class vs drv? Why is the plot not useful?
Which geom might be a better choice?
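One possible answer, offered as a sketch rather than the only solution: both class and drv are categorical, so many cars land on the same few positions and a plain scatterplot hides how many there are. geom_count() is one geom that makes the overlap visible; the object names p_points and p_count are illustrative.

```r
library(ggplot2)

# Both class and drv are categorical, so geom_point() stacks many cars
# on the same handful of positions.
p_points <- ggplot(data = mpg, mapping = aes(x = class, y = drv)) +
  geom_point()

# geom_count() sizes each point by the number of observations at that position.
p_count <- ggplot(data = mpg, mapping = aes(x = class, y = drv)) +
  geom_count()

p_count
```

geom_jitter() would be another reasonable choice if you prefer to see every car as its own point.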
EXERCISE
Why does the following give an error and how would you fix it?
ggplot(data = mpg) +
  geom_point()
Add the following caption to the plot you made in the previous exercise: “Data come from the ggplot2 package.” HINT: Look at the documentation for labs().
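A minimal sketch of one way to approach this exercise. The call above fails because geom_point() requires x and y aesthetics and none are supplied. The choice of displ and hwy here is an assumption for illustration; any two columns of mpg would remove the error.

```r
library(ggplot2)

# geom_point() needs x and y aesthetics; the original call supplied none.
# displ and hwy are an illustrative choice, not the only valid one.
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  labs(caption = "Data come from the ggplot2 package.")

p
```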
Flexible visualization
You can use visual elements to communicate your findings in engaging ways.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class == "2seater"))
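To see why the plot above gets two colours, it can help to evaluate the expression on its own. The sketch below assumes nothing beyond ggplot2 and the mpg data; is_two_seater and the legend title "Two-seater" are illustrative names.

```r
library(ggplot2)

# The comparison is evaluated once per row and yields TRUE or FALSE.
is_two_seater <- mpg$class == "2seater"
table(is_two_seater)

# labs(color = ...) replaces the default legend title, which would
# otherwise be the raw expression class == "2seater".
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class == "2seater")) +
  labs(color = "Two-seater")

p
```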
What’s gone wrong with this code? Why are the points not blue?
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
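A sketch of the contrast, for comparison with the code above: inside aes(), "blue" is treated as data (a variable with a single value), so ggplot2 assigns it a default colour and a legend. Setting color as an argument of the geom, outside aes(), applies the literal colour. The names p_mapped and p_fixed are illustrative.

```r
library(ggplot2)

# Inside aes(): "blue" is mapped like a one-level variable, so the points
# get the default discrete colour and a legend entry labelled "blue".
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# Outside aes(): color is set to a fixed value for every point.
p_fixed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

p_fixed
```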
EXERCISE
Which variables in mpg are categorical? Which variables are continuous?
Map a continuous variable to color, size, and shape. How do these aesthetics behave differently for categorical vs. continuous variables?
What happens if you map the same variable to multiple aesthetics?
Let’s clean our graph up
Less is more when it comes to data visualization.
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(colour = class)) +
  geom_smooth(method = "lm") +
  theme_minimal() +
  labs(
    title = "Engine displacement and highway miles per gallon",
    subtitle = "Values for seven different classes of cars",
    x = "Engine displacement (L)",
    y = "Highway miles per gallon"
  ) +
  scale_color_colorblind()
Connect these relationships or trends to your expectations (or hypotheses about the data).
HOMEWORK
In the final session, you will apply the skills you will learn over the next few days to a problem that interests you. To prepare for this, you need to find a data set that: